---
title: "Preparing arvig -- Anti-Refugee Violence in Germany"
author: "Valentina Gonzalez Rostani"
date: "2024-08-21"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Preamble

- Contact: mag384@pitt.edu
- Version: R 3.4.3.1

This R Markdown file:

- Creates a file with counts of hate incidents by region in Germany.

Input:

- The `arvig` data, loaded from the R package of the same name.

Output:

- `Data/Region_Germany/final_aggregated_data.dta`: a Stata `.dta` file with the count of hate incidents per geographic area.




## What is arvig?

This file uses a subset of the R data package `arvig`, which contains a georeferenced dataset on categorised events of anti-refugee violence and social unrest in Germany from 2014 onwards. The period analyzed runs from 1 September 2016 to 1 October 2017, roughly the year leading up to the September 2017 election.

`arvig` is based on information published by the civil society project [Mut Gegen Rechte Gewalt](https://www.mut-gegen-rechte-gewalt.de). The data up to 2016 is described in the background paper ["Refugees Welcome? A Dataset on Anti-Refugee Violence in Germany" in *Research \& Politics* **3**(4)](http://doi.org/10.1177/2053168016679590).


## Installation

You can install `arvig` from GitHub with

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("davben/arvig")
```

Once the package is installed, you can load the dataset with

```{r usage}
library(arvig)
library(tidyverse)
library(sf)

data("arvig")
```

## Preparing the data

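I count the events and flag two event types. Each event gets `count = 1`, which is later copied to `anti`, the variable holding the count of incidents. Two indicators, `attack` and `demonstration`, are built from `category_en`.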

```{r frequency-table}
# Reload the raw data so this chunk starts from a clean copy
data(arvig, package = "arvig")

# Each row is one event; `count` is summed later to get events per district
arvig$count <- 1

arvig <- arvig %>%
  mutate(
    # 1 for the categories treated as attacks here, 0 otherwise
    attack = case_when(
      category_en %in% c("arson & miscellaneous attack",
                         "demonstration & miscellaneous attack",
                         "assault",
                         "miscellaneous attack",
                         "miscellaneous attack & assault",
                         "arson",
                         "other") ~ 1,
      TRUE ~ 0 # Assigns 0 if none of the categories above match
    ),
    # 1 for the categories treated as demonstrations here, 0 otherwise
    demonstration = case_when(
      category_en %in% c("demonstration",
                         "demonstration & miscellaneous attack",
                         "suspicion") ~ 1,
      TRUE ~ 0 # Assigns 0 if none of the categories above match
    )
  )
```
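
The two indicators are not mutually exclusive: `"demonstration & miscellaneous attack"` appears in both label lists, so such an event is flagged as both an attack and a demonstration. The sketch below illustrates the overlap on a hypothetical toy data frame using base R's `%in%`. The rows and the names `toy`, `attack_labels`, `demo_labels`, and `both` are made up for illustration; the category labels are copied from the chunk above.

```r
# Hypothetical toy events; only the category labels come from the real data
toy <- data.frame(
  category_en = c("arson",
                  "demonstration",
                  "demonstration & miscellaneous attack",
                  "suspicion"),
  stringsAsFactors = FALSE
)

# Label lists copied from the chunk above
attack_labels <- c("arson & miscellaneous attack",
                   "demonstration & miscellaneous attack",
                   "assault",
                   "miscellaneous attack",
                   "miscellaneous attack & assault",
                   "arson",
                   "other")
demo_labels <- c("demonstration",
                 "demonstration & miscellaneous attack",
                 "suspicion")

# Same 0/1 coding as the case_when() calls, written with %in%
toy$attack <- as.numeric(toy$category_en %in% attack_labels)
toy$demonstration <- as.numeric(toy$category_en %in% demo_labels)

# Events flagged under both indicators
toy$both <- toy$attack == 1 & toy$demonstration == 1
print(toy)
```

One consequence is that, after aggregation, `attack + demonstration` need not equal `anti` for a given district.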

## Common format to merge

To merge the data with the other sources, I derive a district identifier called `kreis` from the first five characters of `community_id`, dropping a leading zero where there is one.

```{r}
arvig$kreis <- ifelse(substr(arvig$community_id, 1, 1) == "0", 
                           substr(arvig$community_id, 2, 5), 
                           substr(arvig$community_id, 1, 5))
```
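
As a rough illustration of what this step does, the sketch below applies the same `substr()` logic to three hypothetical identifiers. It assumes `community_id` is stored as an 8-character string that may start with a zero; the example IDs and the names `ids`, `kreis_chr`, and `kreis_num` are invented for illustration. Under that assumption, converting the first five characters to numeric gives the same result, because the conversion drops the leading zero anyway.

```r
# Hypothetical 8-character community IDs (assumed format, not real records)
ids <- c("03159016", "11000000", "14612000")

# Same logic as above: drop a leading zero, otherwise keep the first five characters
kreis_chr <- ifelse(substr(ids, 1, 1) == "0",
                    substr(ids, 2, 5),
                    substr(ids, 1, 5))

# Alternative: take the first five characters and let as.numeric() drop the zero
kreis_num <- as.numeric(substr(ids, 1, 5))

print(kreis_chr)
print(kreis_num)

# Both routes agree once the character version is converted to numeric
all.equal(as.numeric(kreis_chr), kreis_num)
```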


```{r}

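# Numeric version of the first five characters of community_id (not used below)
arvig$new_column <- as.numeric(substr(arvig$community_id, 1, 5))

# Check data types for the columns used in the aggregation
sapply(arvig[c("kreis", "count", "attack", "demonstration")], class)

# Convert the district code and the event count to numeric
arvig$kreis <- as.numeric(arvig$kreis)
arvig$anti <- as.numeric(arvig$count)

# Recode two district codes before merging
arvig$kreis[arvig$kreis == 3159] <- 3158
arvig$kreis[arvig$kreis == 11000] <- 11100

# Keep a copy of the state name under a separate column name
arvig$state_state <- arvig$state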

```

## Subsetting to the period of interest: the year before the election

```{r}
# Ensure that the 'date' column is in Date format
arvig$date <- as.Date(arvig$date)

# Keep events from 2016-09-01 through 2017-10-01 (both bounds inclusive)
subset_arvig <- subset(arvig, date >= as.Date("2016-09-01") & date <= as.Date("2017-10-01"))

```
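
Both bounds of the filter are inclusive, so the window covers 13 months and extends a few days past the election of 24 September 2017. A small sketch with hypothetical dates (the `toy`, `in_window`, and `kept` names are invented for illustration) shows which events survive the filter:

```r
# Hypothetical event dates around the edges of the window
toy <- data.frame(
  date = as.Date(c("2016-08-31",   # one day before the window opens
                   "2016-09-01",   # first day of the window
                   "2017-09-24",   # election day
                   "2017-10-01",   # last day of the window
                   "2017-10-02"))  # one day after the window closes
)

# Same condition as in the subset() call above
in_window <- toy$date >= as.Date("2016-09-01") & toy$date <= as.Date("2017-10-01")
kept <- toy[in_window, , drop = FALSE]

print(kept)
```

If only pre-election events are wanted, the upper bound would need to be moved to the election date; the code above keeps the window as written.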

## Collapsing by geographic unit

Because the data is a list of individual events, I aggregate it to obtain one row of counts per district (`kreis`).

```{r}
# Sum the event counts within each district
agg_data <- aggregate(cbind(anti, attack, demonstration) ~ kreis,
                      data = subset_arvig, FUN = sum, na.rm = TRUE)

# Keep the descriptive variables from the first event listed for each district
arvig_sorted <- subset_arvig[order(subset_arvig$kreis), ]
first_instances <- arvig_sorted[!duplicated(arvig_sorted$kreis),
                                c("kreis", "community_id", "longitude",
                                  "latitude", "state_state", "location")]

# Merge the aggregated counts with the first instances of the other variables
final_aggregated_data <- merge(agg_data, first_instances, by = "kreis")
```
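
To make the collapse step concrete, here is a minimal sketch on a hypothetical three-event data frame. All values and the names `toy`, `agg`, `toy_sorted`, `first`, and `out` are invented for illustration. It sums the indicators per district with `aggregate()`, takes the descriptive columns from the first event in each district, and merges the two.

```r
# Hypothetical events: two in district 3158, one in district 11100
toy <- data.frame(
  kreis         = c(3158, 3158, 11100),
  anti          = c(1, 1, 1),
  attack        = c(1, 0, 1),
  demonstration = c(0, 1, 0),
  location      = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Sum the numeric indicators within each district
agg <- aggregate(cbind(anti, attack, demonstration) ~ kreis,
                 data = toy, FUN = sum)

# Take the descriptive column from the first event of each district
toy_sorted <- toy[order(toy$kreis), ]
first <- toy_sorted[!duplicated(toy_sorted$kreis), c("kreis", "location")]

# One row per district: counts plus the first event's descriptive values
out <- merge(agg, first, by = "kreis")
print(out)
```

Two things are worth keeping in mind with this design. The descriptive columns (here `location`) describe a single event, not the district as a whole. And districts with no events in the period do not appear in the result at all, so any zero counts have to be filled in after merging with a complete list of districts.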


## Saving the data to merge with other sources

```{r}

# Load the haven package, which provides write_dta()
library(haven)

# Get the directory of the current Rmd file
# (rstudioapi only works inside an interactive RStudio session)
current_dir <- dirname(rstudioapi::getActiveDocumentContext()$path)

# Define the path to the "Publication" folder, the parent of that directory
publication_folder <- normalizePath(file.path(current_dir, ".."))

# Build the output path explicitly instead of changing the working directory
out_path <- file.path(publication_folder, "Data", "Region_Germany",
                      "final_aggregated_data.dta")

# Verify the output path has been built correctly
print(out_path)

# Use write_dta to save the data frame as a .dta file
write_dta(final_aggregated_data, out_path)
```

